corpus-tools.org: An Interoperable Generic Software Tool Set for Multi-layer Linguistic Corpora

نویسندگان

  • Stephan Druskat
  • Volker Gast
  • Thomas Krause
  • Florian Zipser
چکیده

This paper introduces an open source, interoperable generic software tool set catering for the entire workflow of creation, migration, annotation, query and analysis of multi-layer linguistic corpora. It consists of four components: Salt, a graph-based meta model and API for linguistic data, the common data model for the rest of the tool set; Pepper, a conversion tool and platform for linguistic data that can be used to convert many different linguistic formats into each other; Atomic, an extensible, platform-independent multi-layer desktop annotation software for linguistic corpora; ANNIS, a search and visualization architecture for multi-layer linguistic corpora with many different visualizations and a powerful native query language. The set was designed to solve the following issues in a multi-layer corpus workflow: Lossless data transition between tools through a common data model generic enough to allow for a potentially unlimited number of different types of annotation, conversion capabilities for different linguistic formats to cater for the processing of data from different sources and/or with existing annotations, a high level of extensibility to enhance the sustainability of the whole tool set, analysis capabilities encompassing corpus and annotation query alongside multi-faceted visualizations of all annotation layers.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Interoperability of Corpora and Annotations

This paper describes the application of OWL and RDF to address the interoperability of linguistic corpora and linguistic annotations within such corpora. Interoperability of linguistic corpora involves two aspects: Structural interoperability (annotations of different origin are represented using the same formalism) and conceptual interoperability (annotations of different origin are linked to ...

متن کامل

A generic formalism to represent linguistic corpora in RDF and OWL/DL

This paper describes POWLA, a generic formalism to represent linguistic corpora by means of RDF and OWL/DL. Unlike earlier approaches in this direction, POWLA is not tied to a specific selection of annotation layers, but rather, it is designed to support any kind of text-oriented annotation. POWLA inherits its generic character from the underlying data model PAULA (Dipper, 2005; Chiarcos et al....

متن کامل

MULTEXT: Multilingual Text Tools and Corpora

MULTEXT (Multilingual Text Tools and Corpora) is the largest project funded in the Commission of European Communities Linguistic Research and Engineering Program. The project will contribute to the development of generally usable software tools to manipulate and analyse text corpora and to create multi-lingual text corpora with structural and linguistic markup. It will attempt to establish conv...

متن کامل

Multi-level annotation of linguistic data with MMAX2

This paper describes how richly annotated corpora can be created with the annotation tool MMAX2. The description is from the point of view of Computational Linguistics, a discipline where annotated corpora are often used as resources for software development. The paper outlines the important steps in the life cycle of an annotation and details how the tool MMAX2 can be employed in each of them.

متن کامل

Towards A Modular Data Model For Multi-Layer Annotated Corpora

In this paper we discuss the current methods in the representation of corpora annotated at multiple levels of linguistic organization (so-called multi-level or multi-layer corpora). Taking five approaches which are representative of the current practice in this area, we discuss the commonalities and differences between them focusing on the underlying data models. The goal of the paper is to ide...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016